Closing the Gap: Improved Bounds on Optimal POMDP Solutions
Authors
Abstract
POMDP algorithms have made significant progress in recent years, allowing practitioners to find good solutions to increasingly large problems. Most approaches (including point-based and policy iteration techniques) operate by refining a lower bound on the optimal value function. Several approaches (e.g., HSVI2, SARSOP, grid-based approaches and online forward search) also refine an upper bound. However, approximating the optimal value function by an upper bound is computationally expensive, and therefore tightness is often sacrificed to improve efficiency (e.g., the sawtooth approximation). In this paper, we describe a new approach to efficiently compute tighter bounds by (i) conducting a prioritized breadth-first search over the reachable beliefs, (ii) propagating upper bound improvements with an augmented POMDP and (iii) using exact linear programming (instead of the sawtooth approximation) for upper bound interpolation. As a result, we can represent the bounds more compactly and significantly reduce the gap between upper and lower bounds on several benchmark problems.
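To make the contrast between sawtooth and exact interpolation concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes a two-state POMDP, where the interpolation linear program reduces to taking the best convex combination of two stored points that bracket the query belief, so no LP solver is needed. The function names (`sawtooth`, `lp_interp_two_state`) and all numbers are hypothetical and chosen only for illustration.

```python
from itertools import product


def sawtooth(b, corner, points):
    """Sawtooth upper bound at belief b (a tuple of state probabilities).

    corner[s] is the upper-bound value at the belief concentrated on state s.
    points is a list of (belief, value) pairs for stored interior beliefs.
    Each stored point is combined only with the simplex corners.
    """
    v0 = sum(bs * cs for bs, cs in zip(b, corner))  # corner-only interpolation
    best = v0
    for bi, vi in points:
        v0_i = sum(x * c for x, c in zip(bi, corner))
        # Largest weight that can be placed on bi while staying inside b.
        ratio = min(b[s] / bi[s] for s in range(len(b)) if bi[s] > 0)
        best = min(best, v0 + ratio * (vi - v0_i))
    return best


def lp_interp_two_state(b, corner, points):
    """Exact convex-hull interpolation, valid for TWO states only (assumption).

    Solves min sum_i c_i * v_i  s.t.  sum_i c_i * b_i = b, sum_i c_i = 1, c >= 0.
    With two states an optimal solution uses at most two points, so it is
    enough to enumerate pairs of stored points that bracket b[0].
    """
    all_pts = [((1.0, 0.0), corner[0]), ((0.0, 1.0), corner[1])] + list(points)
    p = b[0]
    best = float("inf")
    for (bi, vi), (bj, vj) in product(all_pts, repeat=2):
        pi, pj = bi[0], bj[0]
        if pi <= p <= pj:
            if pj == pi:
                cand = min(vi, vj)
            else:
                w = (p - pi) / (pj - pi)
                cand = (1 - w) * vi + w * vj
            best = min(best, cand)
    return best


# Illustrative data (made up): corner values of 10 and two interior points.
corner = [10.0, 10.0]
points = [((0.3, 0.7), 4.0), ((0.7, 0.3), 4.0)]
query = (0.5, 0.5)

saw = sawtooth(query, corner, points)            # about 5.71 (= 40/7)
lp = lp_interp_two_state(query, corner, points)  # 4.0
```

On this toy data the exact interpolation gives a tighter upper bound than sawtooth because it may combine the two interior points with each other, whereas sawtooth only combines one interior point with the corners. With more than two states, the same optimization would be handed to a general LP solver.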
Related resources
Numerical Formulation on Crack Closing Effect In Buckling Analysis of Edge-Cracked Columns
In this paper, buckling of a simply supported column with an edge crack is investigated numerically and analytically. Four scenarios of damage severity are applied to a column, and the open-crack assumption and the effect of crack closing on the column's stability, which depends on the position and size of the cracks, are compared numerically. Crack surface contact is modeled with GAP elements using ...
α-min: A Compact Approximate Solver For Finite-Horizon POMDPs
In many POMDP applications in computational sustainability, it is important that the computed policy have a simple description, so that it can be easily interpreted by stakeholders and decision makers. One measure of simplicity for POMDP value functions is the number of α-vectors required to represent the value function. Existing POMDP methods seek to optimize the accuracy of the value function...
Cost minimization in multi-commodity multi-mode generalized networks with time windows
Cost Minimization in Multi-Commodity, Multi-Mode Generalized Networks with Time Windows. (December 2005) Ping-Shun Chen, B.S., National Chiao Tung University, Taiwan; M.S., National Chiao Tung University, Taiwan. Chair of Advisory Committee: Dr. Alberto Garcia-Diaz. The purpose of this research is to develop a heuristic algorithm to minimize total costs in multi-commodity, multi-mode generalized ...
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can either increase or decrease a system's availability, so it is valuable to evaluate a maintenance policy from the cost and availability points of view simultaneously, according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
Improved integrality gap upper bounds for TSP with distances one and two
We study the structure of solutions to linear programming formulations for the traveling salesperson problem (TSP). We perform a detailed analysis of the support of the subtour elimination linear programming relaxation, which leads to algorithms that find 2-matchings with few components in polynomial time. The number of components directly leads to integrality gap upper bounds for the TSP with ...
Journal title:
Volume / Issue
Pages -
Publication date 2011